Automating the Correctness Assessment of AI-generated Code for Security Contexts
In this paper, we propose a fully automated method, named ACCA, to evaluate
the correctness of AI-generated code for security purposes. The method uses
symbolic execution to assess whether the AI-generated code behaves as a
reference implementation. We use ACCA to assess four state-of-the-art models
trained to generate security-oriented assembly code and compare the results of
the evaluation with different baseline solutions, including output similarity
metrics, widely used in the field, and the well-known ChatGPT, the AI-powered
language model developed by OpenAI. Our experiments show that our method
outperforms the baseline solutions and assesses the correctness of the
AI-generated code similarly to human-based evaluation, which is considered
the ground truth for the assessment in the field. Moreover, ACCA has a very
strong correlation with human evaluation (Pearson's correlation coefficient
r=0.84 on average). Finally, since it is a fully automated solution that does
not require any human intervention, the proposed method performs the assessment
of every code snippet in ~0.17s on average, which is definitely lower than the
average time required by human analysts to manually inspect the code, based on
our experience
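The agreement figure quoted above is a Pearson correlation between automated and human judgments. A minimal sketch of that computation follows, assuming each snippet receives a numeric score from both sources. The function name `pearson_r` and the sample scores are hypothetical and do not come from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-snippet verdicts (1 = correct, 0 = incorrect).
    automated_scores = [1, 0, 1, 1, 0, 1]
    human_scores = [1, 0, 1, 0, 0, 1]
    print(f"r = {pearson_r(automated_scores, human_scores):.2f}")
```

Values of r close to 1 indicate that the automated assessment ranks snippets much as the human evaluators do.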
Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators
AI-based code generators are an emerging solution for automatically writing
programs starting from descriptions in natural language, by using deep neural
networks (Neural Machine Translation, NMT). In particular, code generators have
been used for ethical hacking and offensive security testing by generating
proof-of-concept attacks. Unfortunately, the evaluation of code generators
still faces several issues. The current practice uses automatic metrics, which
compute the textual similarity of generated code with ground-truth references.
However, it is not clear what metric to use, and which metric is most suitable
for specific contexts. This practical experience report analyzes a large set of
output similarity metrics on offensive code generators. We apply the metrics on
two state-of-the-art NMT models using two datasets containing offensive
assembly and Python code with their descriptions in the English language. We
compare the estimates from the automatic metrics with human evaluation and
provide practical insights into their strengths and limitations
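To illustrate what an output similarity metric computes, here is a minimal sketch of two simple ones: exact match and a similarity derived from character-level edit distance. These two are chosen for illustration only, since the abstract does not list which metrics the report analyzes, and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(generated, reference):
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(generated, reference) / longest


def exact_match(generated, reference):
    """True when the two snippets are identical up to surrounding whitespace."""
    return generated.strip() == reference.strip()


if __name__ == "__main__":
    reference = "xor eax, eax"
    generated = "mov eax, 0"
    print(exact_match(generated, reference))
    print(round(edit_similarity(generated, reference), 2))
```

The sample pair hints at one limitation of textual metrics: both instructions leave zero in `eax`, yet they are scored as different because only the text is compared.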
Enhancing Robustness of AI Offensive Code Generators via Data Augmentation
In this work, we present a method to add perturbations to the code
descriptions, i.e., new inputs in natural language (NL) from well-intentioned
developers, in the context of security-oriented code, and analyze how and to
what extent perturbations affect the performance of AI offensive code
generators. Our experiments show that the performance of the code generators is
highly affected by perturbations in the NL descriptions. To enhance the
robustness of the code generators, we use the method to perform data
augmentation, i.e., to increase the variability and diversity of the training
data, proving its effectiveness against both perturbed and non-perturbed code
descriptions
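The data augmentation idea can be sketched as follows, assuming a single perturbation type (randomly dropping words from the description). The abstract does not say which perturbations the authors use, so this choice, the function names, and the sample pair are assumptions made for illustration.

```python
import random


def drop_words(description, drop_prob, rng):
    """Return the description with each word dropped with probability drop_prob."""
    words = description.split()
    kept = [w for w in words if rng.random() >= drop_prob]
    # Fall back to the original text if every word was dropped.
    return " ".join(kept) if kept else description


def augment(pairs, copies, drop_prob, seed=0):
    """Extend (description, code) pairs with perturbed copies of each description.

    The code side of each pair is left unchanged, so the model sees several
    wordings that map to the same target snippet.
    """
    rng = random.Random(seed)  # seeded for reproducible augmentation
    augmented = list(pairs)
    for description, code in pairs:
        for _ in range(copies):
            augmented.append((drop_words(description, drop_prob, rng), code))
    return augmented


if __name__ == "__main__":
    # Hypothetical training pair: NL description and target assembly snippet.
    training_pairs = [("clear the eax register", "xor eax, eax")]
    for description, code in augment(training_pairs, copies=2, drop_prob=0.2):
        print(f"{description!r} -> {code!r}")
```

Keeping the original pairs alongside the perturbed ones matches the abstract's aim of handling both perturbed and non-perturbed descriptions.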
Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation
Neural Machine Translation (NMT) has matured enough to be recognized as the
premier method for translation between different languages and has
aroused interest in different research areas, including software
engineering. A key step to validate the robustness of the NMT models consists
in evaluating the performance of the models on adversarial inputs, i.e., inputs
obtained from the original ones by adding small amounts of perturbation.
However, for the specific task of code generation (i.e., the
generation of code from a description in natural language), no approach has
yet been defined to validate the robustness of NMT models. In
this work, we address the problem by identifying a set of perturbations and
metrics tailored for the robustness assessment of such models. We present a
preliminary experimental evaluation, showing what type of perturbations affect
the model the most and deriving useful insights for future directions.
Comment: Paper accepted for publication in the proceedings of The 1st Intl.
Workshop on Natural Language-based Software Engineering (NLBSE) to be held
with ICSE 202
How future surgery will benefit from SARS-COV-2-related measures: a SPIGC survey conveying the perspective of Italian surgeons
COVID-19 negatively affected surgical activity, but the potential benefits resulting from adopted measures remain unclear. The aim of this study was to evaluate the change in surgical activity and the potential benefit from COVID-19 measures from the perspective of Italian surgeons on behalf of SPIGC. A nationwide online survey on surgical practice before, during, and after the COVID-19 pandemic was conducted in March-April 2022 (NCT:05323851). Effects of COVID-19 hospital-related measures on surgical patients' management and personal professional development across surgical specialties were explored. Data on demographics, pre-operative/peri-operative/post-operative management, and professional development were collected. Outcomes were matched with the corresponding volume. Four hundred and seventy-three respondents were included in the final analysis across 14 surgical specialties. Since the SARS-CoV-2 pandemic, application of telematic consultations (4.1% vs. 21.6%; p < 0.0001) and diagnostic evaluations (16.4% vs. 42.2%; p < 0.0001) increased. Elective surgical activities were significantly reduced, and surgeons opted more frequently for conservative management with a possible indication for elective (26.3% vs. 35.7%; p < 0.0001) or urgent (20.4% vs. 38.5%; p < 0.0001) surgery. All new COVID-related measures are perceived as likely to be maintained in the future. Surgeons' personal education online increased from 12.6% (pre-COVID) to 86.6% (post-COVID; p < 0.0001). Online educational activities are considered a beneficial effect of the COVID pandemic (56.4%). COVID-19 had a great impact on surgical specialties, with a significant reduction of operation volume. However, some forced changes turned out to be benefits. Isolation measures pushed the use of telemedicine and telemetric devices for outpatient practice and favored communication for educational purposes and surgeon-patient/family communication.
From the Italian surgeons' perspective, COVID-related measures will continue to influence future surgical clinical practice